{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "21cfea06-af98-496a-b13b-106c335a2e65",
   "metadata": {},
   "source": [
    "# Querying options\n",
    "\n",
    "Query vectors can be built from two sources:\n",
    "- user input, for example via the [natural language interface](https://github.com/superlinked/superlinked/blob/main/notebook/feature/natural_language_querying.ipynb).\n",
    "- a vector already stored in the vector database (VDB), typically for item-to-item recommendation or, more generally, as a context vector.\n",
    "\n",
    "This notebook contains examples of both, and of combinations of the two.\n",
    "\n",
    "These inputs (even several of each) can be combined and weighted against each other.\n",
    "The number of returned results can be controlled either by an exact limit or by a minimum required similarity.\n",
    "\n",
    "Let's create a simple example to see the possibilities!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "1bf67328-efe5-4c88-9c36-ce2d4b20d89f",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install superlinked==37.5.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "5a25f839-c109-4447-adbe-0de4ff6f39cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from superlinked import framework as sl\n",
    "\n",
    "pd.set_option(\"display.max_colwidth\", 100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "08f71264-4cf4-405b-9f55-7ea2863c7924",
   "metadata": {},
   "outputs": [],
   "source": [
    "class Paragraph(sl.Schema):\n",
    "    id: sl.IdField\n",
    "    body: sl.String\n",
    "    category: sl.StringList\n",
    "\n",
    "\n",
    "paragraph = Paragraph()\n",
    "\n",
    "body_space = sl.TextSimilaritySpace(text=paragraph.body, model=\"sentence-transformers/all-mpnet-base-v2\")\n",
    "category_space = sl.CategoricalSimilaritySpace(\n",
    "    category_input=paragraph.category, categories=[\"IT\", \"environment\"], uncategorized_as_category=True\n",
    ")\n",
    "paragraph_index = sl.Index([body_space, category_space])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30e0ec2b-540b-4f77-8c7f-96c768f2029a",
   "metadata": {},
   "source": [
    "Now let's fire up a running executor and add some data through our source."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "979eaa8a-3cec-4340-bbf2-776f3c79dd67",
   "metadata": {},
   "outputs": [],
   "source": [
    "source: sl.InMemorySource = sl.InMemorySource(paragraph)\n",
    "executor = sl.InMemoryExecutor(sources=[source], indices=[paragraph_index])\n",
    "app = executor.run()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "0d0e746f-fd42-4364-8453-320cf7c20035",
   "metadata": {},
   "outputs": [],
   "source": [
    "source.put(\n",
    "    [\n",
    "        {\"id\": \"paragraph-1\", \"body\": \"Glorious animals live in the wilderness.\", \"category\": \"environment\"},\n",
    "        {\n",
    "            \"id\": \"paragraph-2\",\n",
    "            \"body\": \"Growing computation power enables advancements in AI.\",\n",
    "            \"category\": \"IT\",\n",
    "        },\n",
    "        {\n",
    "            \"id\": \"paragraph-3\",\n",
    "            \"body\": \"The flora and fauna of a specific habitat highly depend on the weather.\",\n",
    "            \"category\": \"environment\",\n",
    "        },\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "127977d6-2c79-43bb-b41b-94b37f7f9245",
   "metadata": {},
   "source": [
    "## Using the .similar clause\n",
    "\n",
    "The `.similar` clause lets us supply query input that is independent of the stored vectors, such as free text typed by a user."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "43fe15fb-2c7b-41fb-9f0c-0aa9d3260a64",
   "metadata": {},
   "outputs": [],
   "source": [
    "# We use a Param as a placeholder so the same query can be reused with different inputs.\n",
    "# For more info, check the `dynamic_parameters.ipynb` feature notebook in this same folder.\n",
    "similar_query = sl.Query(paragraph_index).find(paragraph).similar(body_space, sl.Param(\"similar_input\")).select_all()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "823e5ed2-76a3-4b3e-b7d3-087cf4faead3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>body</th>\n",
       "      <th>category</th>\n",
       "      <th>id</th>\n",
       "      <th>similarity_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>The flora and fauna of a specific habitat highly depend on the weather.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-3</td>\n",
       "      <td>0.337601</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Glorious animals live in the wilderness.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-1</td>\n",
       "      <td>0.094036</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Growing computation power enables advancements in AI.</td>\n",
       "      <td>[IT]</td>\n",
       "      <td>paragraph-2</td>\n",
       "      <td>0.044686</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                      body  \\\n",
       "0  The flora and fauna of a specific habitat highly depend on the weather.   \n",
       "1                                 Glorious animals live in the wilderness.   \n",
       "2                    Growing computation power enables advancements in AI.   \n",
       "\n",
       "        category           id  similarity_score  \n",
       "0  [environment]  paragraph-3          0.337601  \n",
       "1  [environment]  paragraph-1          0.094036  \n",
       "2           [IT]  paragraph-2          0.044686  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "similar_result_weather = app.query(similar_query, similar_input=\"rainfall\")\n",
    "sl.PandasConverter.to_pandas(similar_result_weather)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "9b2c9b39-820e-4e43-92fb-c4db619f745b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>body</th>\n",
       "      <th>category</th>\n",
       "      <th>id</th>\n",
       "      <th>similarity_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Growing computation power enables advancements in AI.</td>\n",
       "      <td>[IT]</td>\n",
       "      <td>paragraph-2</td>\n",
       "      <td>0.598644</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Glorious animals live in the wilderness.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-1</td>\n",
       "      <td>0.007107</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>The flora and fauna of a specific habitat highly depend on the weather.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-3</td>\n",
       "      <td>-0.121885</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                      body  \\\n",
       "0                    Growing computation power enables advancements in AI.   \n",
       "1                                 Glorious animals live in the wilderness.   \n",
       "2  The flora and fauna of a specific habitat highly depend on the weather.   \n",
       "\n",
       "        category           id  similarity_score  \n",
       "0           [IT]  paragraph-2          0.598644  \n",
       "1  [environment]  paragraph-1          0.007107  \n",
       "2  [environment]  paragraph-3         -0.121885  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "similar_result_it = app.query(similar_query, similar_input=\"progress in AI\")\n",
    "sl.PandasConverter.to_pandas(similar_result_it)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c24cdb32-fa33-4fbd-82a6-48527cc8e8d5",
   "metadata": {},
   "source": [
    "## Using the .with_vector clause\n",
    "\n",
    "The `.with_vector` clause lets us search with the vector of an object already stored in our database. This is useful, for example, for recommending items to a user based on the user's vector."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "a2ed953c-4376-4cdc-a2c8-eba93fdeb622",
   "metadata": {},
   "outputs": [],
   "source": [
    "with_vector_query = sl.Query(paragraph_index).find(paragraph).with_vector(paragraph, \"paragraph-3\", 1.0).select_all()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "50822bc5-ea99-48cf-9254-5a1a456dccba",
   "metadata": {},
   "source": [
    "In this case the weight in the clause does not really matter, as there are no other competing clauses. Stay tuned, because this is not always the case!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "36ff4b5c-82ae-4555-a2f6-9bad9b4e73f3",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>body</th>\n",
       "      <th>category</th>\n",
       "      <th>id</th>\n",
       "      <th>similarity_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>The flora and fauna of a specific habitat highly depend on the weather.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-3</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Glorious animals live in the wilderness.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-1</td>\n",
       "      <td>0.655296</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Growing computation power enables advancements in AI.</td>\n",
       "      <td>[IT]</td>\n",
       "      <td>paragraph-2</td>\n",
       "      <td>-0.009530</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                      body  \\\n",
       "0  The flora and fauna of a specific habitat highly depend on the weather.   \n",
       "1                                 Glorious animals live in the wilderness.   \n",
       "2                    Growing computation power enables advancements in AI.   \n",
       "\n",
       "        category           id  similarity_score  \n",
       "0  [environment]  paragraph-3          1.000000  \n",
       "1  [environment]  paragraph-1          0.655296  \n",
       "2           [IT]  paragraph-2         -0.009530  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "with_vector_result = app.query(with_vector_query)\n",
    "sl.PandasConverter.to_pandas(with_vector_result)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d773d9ea-fd4d-4657-8968-5bd2e1c87a07",
   "metadata": {},
   "source": [
    "The first result is the paragraph we are searching with, the second is the more closely related paragraph, and the least related one comes last.\n",
    "\n",
    "Note, however, that `with_vector` queries can be weighted on a per-space basis as well!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "9a2331d5-631f-4251-9b45-a271e46db36a",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>body</th>\n",
       "      <th>category</th>\n",
       "      <th>id</th>\n",
       "      <th>similarity_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Glorious animals live in the wilderness.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-1</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>The flora and fauna of a specific habitat highly depend on the weather.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-3</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Growing computation power enables advancements in AI.</td>\n",
       "      <td>[IT]</td>\n",
       "      <td>paragraph-2</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                      body  \\\n",
       "0                                 Glorious animals live in the wilderness.   \n",
       "1  The flora and fauna of a specific habitat highly depend on the weather.   \n",
       "2                    Growing computation power enables advancements in AI.   \n",
       "\n",
       "        category           id  similarity_score  \n",
       "0  [environment]  paragraph-1               1.0  \n",
       "1  [environment]  paragraph-3               1.0  \n",
       "2           [IT]  paragraph-2               0.0  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weight_dict: dict[sl.Space, float] = {body_space: 0.0, category_space: 1.0}\n",
    "with_vector_query_space_weights = (\n",
    "    sl.Query(paragraph_index).find(paragraph).with_vector(paragraph, \"paragraph-3\", weight_dict).select_all()\n",
    ")\n",
    "with_vector_result_space_weights = app.query(with_vector_query_space_weights)\n",
    "sl.PandasConverter.to_pandas(with_vector_result_space_weights)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf8c61c5-6900-4071-a418-b9c248924076",
   "metadata": {},
   "source": [
    "In the case above, the results are based only on the `category` information.\n",
    "\n",
    "Below, in contrast, only the body of the text influences the similarities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "11bc0d97-f626-42b5-9239-b428ee48b281",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>body</th>\n",
       "      <th>category</th>\n",
       "      <th>id</th>\n",
       "      <th>similarity_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>The flora and fauna of a specific habitat highly depend on the weather.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-3</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Glorious animals live in the wilderness.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-1</td>\n",
       "      <td>0.310591</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Growing computation power enables advancements in AI.</td>\n",
       "      <td>[IT]</td>\n",
       "      <td>paragraph-2</td>\n",
       "      <td>-0.019059</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                      body  \\\n",
       "0  The flora and fauna of a specific habitat highly depend on the weather.   \n",
       "1                                 Glorious animals live in the wilderness.   \n",
       "2                    Growing computation power enables advancements in AI.   \n",
       "\n",
       "        category           id  similarity_score  \n",
       "0  [environment]  paragraph-3          1.000000  \n",
       "1  [environment]  paragraph-1          0.310591  \n",
       "2           [IT]  paragraph-2         -0.019059  "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weight_dict_alt: dict[sl.Space, float] = {body_space: 1.0, category_space: 0.0}\n",
    "with_vector_query_space_weights_alt = (\n",
    "    sl.Query(paragraph_index).find(paragraph).with_vector(paragraph, \"paragraph-3\", weight_dict_alt).select_all()\n",
    ")\n",
    "with_vector_result_space_weights_alt = app.query(with_vector_query_space_weights_alt)\n",
    "sl.PandasConverter.to_pandas(with_vector_result_space_weights_alt)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a64368ea-22a8-4d62-913c-1a7b3defffee",
   "metadata": {},
   "source": [
    "## Combine them\n",
    "\n",
    "With the use of weights, any combination of inputs is possible. Imagine a situation where we search for a term, `similar_body`, among the paragraphs that are relevant to a specific paragraph, identified by `paragraph_id`. The input can be weighted with the `similar_body_weight` `Param` relative to the context in which the search takes place, which is weighted with the `paragraph_weight` `Param`. Note that the `Param` names are entirely arbitrary; it is the clauses that matter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "014cb6dc-9573-498e-8342-bcac7541a860",
   "metadata": {},
   "outputs": [],
   "source": [
    "# we are using dynamic parameters again\n",
    "combined_query = (\n",
    "    sl.Query(paragraph_index)\n",
    "    .find(paragraph)\n",
    "    .similar(body_space, sl.Param(\"similar_body\"), weight=sl.Param(\"similar_body_weight\"))\n",
    "    .with_vector(paragraph, sl.Param(\"paragraph_id\"), weight=sl.Param(\"paragraph_weight\"))\n",
    "    .select_all()\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "87d6d6fe-56b5-422e-8bef-c3d9f5d4783c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>body</th>\n",
       "      <th>category</th>\n",
       "      <th>id</th>\n",
       "      <th>similarity_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>The flora and fauna of a specific habitat highly depend on the weather.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-3</td>\n",
       "      <td>0.831307</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Glorious animals live in the wilderness.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-1</td>\n",
       "      <td>0.619865</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Growing computation power enables advancements in AI.</td>\n",
       "      <td>[IT]</td>\n",
       "      <td>paragraph-2</td>\n",
       "      <td>0.218673</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                      body  \\\n",
       "0  The flora and fauna of a specific habitat highly depend on the weather.   \n",
       "1                                 Glorious animals live in the wilderness.   \n",
       "2                    Growing computation power enables advancements in AI.   \n",
       "\n",
       "        category           id  similarity_score  \n",
       "0  [environment]  paragraph-3          0.831307  \n",
       "1  [environment]  paragraph-1          0.619865  \n",
       "2           [IT]  paragraph-2          0.218673  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# equal weight\n",
    "combined_result = app.query(\n",
    "    combined_query,\n",
    "    similar_body=\"progress in AI\",\n",
    "    paragraph_id=\"paragraph-3\",\n",
    "    similar_body_weight=1,\n",
    "    paragraph_weight=1,\n",
    ")\n",
    "sl.PandasConverter.to_pandas(combined_result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "590e00b8-0005-40df-bbc9-2eb4593fe726",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>body</th>\n",
       "      <th>category</th>\n",
       "      <th>id</th>\n",
       "      <th>similarity_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>The flora and fauna of a specific habitat highly depend on the weather.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-3</td>\n",
       "      <td>0.984387</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Glorious animals live in the wilderness.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-1</td>\n",
       "      <td>0.656062</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Growing computation power enables advancements in AI.</td>\n",
       "      <td>[IT]</td>\n",
       "      <td>paragraph-2</td>\n",
       "      <td>0.065250</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                      body  \\\n",
       "0  The flora and fauna of a specific habitat highly depend on the weather.   \n",
       "1                                 Glorious animals live in the wilderness.   \n",
       "2                    Growing computation power enables advancements in AI.   \n",
       "\n",
       "        category           id  similarity_score  \n",
       "0  [environment]  paragraph-3          0.984387  \n",
       "1  [environment]  paragraph-1          0.656062  \n",
       "2           [IT]  paragraph-2          0.065250  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# upweight context - notice the score differences\n",
    "combined_result_context = app.query(\n",
    "    combined_query,\n",
    "    similar_body=\"progress in AI\",\n",
    "    paragraph_id=\"paragraph-3\",\n",
    "    similar_body_weight=0.25,\n",
    "    paragraph_weight=1,\n",
    ")\n",
    "sl.PandasConverter.to_pandas(combined_result_context)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "0e9cbf53-4829-49c2-96e7-e9aa81cbb6ad",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>body</th>\n",
       "      <th>category</th>\n",
       "      <th>id</th>\n",
       "      <th>similarity_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Glorious animals live in the wilderness.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-1</td>\n",
       "      <td>0.519222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>The flora and fauna of a specific habitat highly depend on the weather.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-3</td>\n",
       "      <td>0.488978</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Growing computation power enables advancements in AI.</td>\n",
       "      <td>[IT]</td>\n",
       "      <td>paragraph-2</td>\n",
       "      <td>0.300537</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                      body  \\\n",
       "0                                 Glorious animals live in the wilderness.   \n",
       "1  The flora and fauna of a specific habitat highly depend on the weather.   \n",
       "2                    Growing computation power enables advancements in AI.   \n",
       "\n",
       "        category           id  similarity_score  \n",
       "0  [environment]  paragraph-1          0.519222  \n",
       "1  [environment]  paragraph-3          0.488978  \n",
       "2           [IT]  paragraph-2          0.300537  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# give more weight to query time input - the most relevant document changes\n",
    "combined_result_input = app.query(\n",
    "    combined_query,\n",
    "    similar_body=\"progress in AI\",\n",
    "    paragraph_id=\"paragraph-3\",\n",
    "    similar_body_weight=1,\n",
    "    paragraph_weight=0.1,\n",
    ")\n",
    "sl.PandasConverter.to_pandas(combined_result_input)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "695f9021-2500-4c82-af73-980a7571dc2e",
   "metadata": {},
   "source": [
    "To use per-space weights, the dict structure has to be defined in the query itself, while the actual weight values can be `Param`s."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "0657feb0-04fe-4f09-a0da-add0726ae3d5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>body</th>\n",
       "      <th>category</th>\n",
       "      <th>id</th>\n",
       "      <th>similarity_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Glorious animals live in the wilderness.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-1</td>\n",
       "      <td>0.527039</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>The flora and fauna of a specific habitat highly depend on the weather.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-3</td>\n",
       "      <td>0.514157</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Growing computation power enables advancements in AI.</td>\n",
       "      <td>[IT]</td>\n",
       "      <td>paragraph-2</td>\n",
       "      <td>0.300010</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                      body  \\\n",
       "0                                 Glorious animals live in the wilderness.   \n",
       "1  The flora and fauna of a specific habitat highly depend on the weather.   \n",
       "2                    Growing computation power enables advancements in AI.   \n",
       "\n",
       "        category           id  similarity_score  \n",
       "0  [environment]  paragraph-1          0.527039  \n",
       "1  [environment]  paragraph-3          0.514157  \n",
       "2           [IT]  paragraph-2          0.300010  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# we are using dynamic parameters again\n",
    "combined_query_dict_context_weights = (\n",
    "    sl.Query(paragraph_index)\n",
    "    .find(paragraph)\n",
    "    .similar(body_space, sl.Param(\"similar_body\"), weight=sl.Param(\"similar_body_weight\"))\n",
    "    .with_vector(\n",
    "        paragraph,\n",
    "        sl.Param(\"paragraph_id\"),\n",
    "        weight={body_space: sl.Param(\"body_paragraph_weight\"), category_space: sl.Param(\"category_paragraph_weight\")},\n",
    "    )\n",
    "    .select_all()\n",
    ")\n",
    "# as before, the context weights are supplied at query time, now one per space\n",
    "combined_result_input = app.query(\n",
    "    combined_query_dict_context_weights,\n",
    "    similar_body=\"progress in AI\",\n",
    "    paragraph_id=\"paragraph-3\",\n",
    "    similar_body_weight=1,\n",
    "    body_paragraph_weight=0.15,\n",
    "    category_paragraph_weight=0.05,\n",
    ")\n",
    "sl.PandasConverter.to_pandas(combined_result_input)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac1ce47c-f620-43b7-a4f9-86bab5a99d9e",
   "metadata": {},
   "source": [
    "## Advanced query mechanics\n",
    "\n",
    "Query behavior can be influenced using **weights**, which control how similarity is computed across and within vector spaces. There are two types of weights:\n",
    "\n",
    "### 1. Space Weights \n",
    "\n",
    "**Space weights** determine the relative contribution of each vector space to the overall similarity score between the query and knowledge base items. These weights govern **inter-space importance**, reweighting the normalized query vector components coming from each space. \n",
    "\n",
    "### 2. Clause Weights \n",
    "\n",
    "**Clause weights** affect how individual clauses contribute to the formation of a query vector within a specific space, controlling **intra-space influence**. Both `.similar` and `.with_vector` clauses support weights: \n",
    "* `.similar` clauses apply to a single space.\n",
    "* `.with_vector` clauses apply across all vector spaces.\n",
    "\n",
    "Clause weights influence how the query vector is constructed per space. After this, space weights are applied to normalized per-space vectors to adjust the overall balance across spaces."
   ]
  },
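  {
   "cell_type": "markdown",
   "id": "sketch-two-stage-weights",
   "metadata": {},
   "source": [
    "The two stages described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Superlinked's actual implementation: the helper names (`normalize`, `toy_query_vector`) and the tiny vectors are made up for illustration, and clauses are assumed to combine as a plain weighted sum.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def normalize(vector):\n",
    "    # scale to unit length; leave all-zero vectors untouched\n",
    "    norm = np.linalg.norm(vector)\n",
    "    return vector / norm if norm > 0 else vector\n",
    "\n",
    "\n",
    "def toy_query_vector(clauses_per_space, space_weights):\n",
    "    # clauses_per_space: one list of (vector, clause_weight) pairs per space\n",
    "    parts = []\n",
    "    for clauses, space_weight in zip(clauses_per_space, space_weights):\n",
    "        # stage 1: clause weights mix the inputs inside a single space (assumed weighted sum)\n",
    "        mixed = sum(weight * np.asarray(vector, dtype=float) for vector, weight in clauses)\n",
    "        # stage 2: the space weight rescales the normalized per-space part\n",
    "        parts.append(space_weight * normalize(mixed))\n",
    "    return np.concatenate(parts)\n",
    "\n",
    "\n",
    "# hypothetical toy inputs: (vector, clause weight) for a .similar and a .with_vector clause\n",
    "body_clauses = [([1.0, 0.0], 1.0), ([0.0, 1.0], 5.0)]\n",
    "category_clauses = [([1.0, 0.0, 0.0], 1.0), ([0.0, 1.0, 0.0], 1.0)]\n",
    "\n",
    "query_vector = toy_query_vector([body_clauses, category_clauses], space_weights=[1.0, 5.0])\n",
    "body_part, category_part = query_vector[:2], query_vector[2:]\n",
    "print(np.linalg.norm(body_part), np.linalg.norm(category_part))\n",
    "```\n",
    "\n",
    "In this toy model the clause weights only change the direction of each per-space part, while the length of each part is set entirely by its space weight."
   ]
  },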
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "d41f0d29-384b-4435-98d1-4201afe32a9f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# let's add 2 more examples for demonstration purposes\n",
    "source.put(\n",
    "    [\n",
    "        {\n",
    "            \"id\": \"paragraph-4\",\n",
    "            \"body\": \"The AI boom contributes to global warming through heating caused by extensive GPU usage.\",\n",
    "            \"category\": [\"IT\", \"environment\"],\n",
    "        },\n",
    "        {\n",
    "            \"id\": \"paragraph-5\",\n",
    "            \"body\": \"An astonishing number of users still use ancient Windows operating systems.\",\n",
    "            \"category\": [\"IT\"],\n",
    "        },\n",
    "    ]\n",
    ")\n",
    "\n",
    "\n",
    "# helper function for partial scores\n",
    "def get_partial_score_df(result: sl.QueryResult) -> pd.DataFrame:\n",
    "    partial_score_df: pd.DataFrame = pd.DataFrame(\n",
    "        [[entry.id] + list(entry.metadata.partial_scores) for entry in result.entries],\n",
    "        columns=[\"id\", \"body_space\", \"category_space\"],\n",
    "    )\n",
    "    return sl.PandasConverter.to_pandas(result).merge(partial_score_df, on=\"id\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3e0a2a3-b327-40ad-a3f2-316f27c570c6",
   "metadata": {},
   "source": [
    "We will use partial scores to explain our results in more detail; the function above is a helper for that. For a more focused look at partial scores, see the relevant [feature notebook](https://github.com/superlinked/superlinked/blob/main/notebook/feature/query_result.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "18c39e10-a835-418f-b2b1-e98138d0c462",
   "metadata": {},
   "outputs": [],
   "source": [
    "# create a complicated query to showcase the levers we can pull to affect query results in their entirety\n",
    "advanced_query = (\n",
    "    sl.Query(\n",
    "        paragraph_index,\n",
    "        weights={body_space: sl.Param(\"body_space_weight\"), category_space: sl.Param(\"category_space_weight\")},\n",
    "    )\n",
    "    .find(paragraph)\n",
    "    .similar(body_space, sl.Param(\"body_input\"), sl.Param(\"body_similar_weight\"))\n",
    "    .similar(category_space, sl.Param(\"category_input\"), sl.Param(\"category_similar_weight\"))\n",
    "    .with_vector(\n",
    "        paragraph,\n",
    "        sl.Param(\"paragraph_id\"),\n",
    "        weight={\n",
    "            body_space: sl.Param(\"body_with_vector_weight\"),\n",
    "            category_space: sl.Param(\"category_with_vector_weight\"),\n",
    "        },\n",
    "    )\n",
    "    .select_all()\n",
    "    .include_metadata()\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "bcd296a2-85c4-439b-ae5b-404aaa319291",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>body</th>\n",
       "      <th>category</th>\n",
       "      <th>id</th>\n",
       "      <th>similarity_score</th>\n",
       "      <th>body_space</th>\n",
       "      <th>category_space</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>The AI boom contributes to global warming through heating caused by extensive GPU usage.</td>\n",
       "      <td>[IT, environment]</td>\n",
       "      <td>paragraph-4</td>\n",
       "      <td>0.663999</td>\n",
       "      <td>0.163999</td>\n",
       "      <td>0.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>An astonishing number of users still use ancient Windows operating systems.</td>\n",
       "      <td>[IT]</td>\n",
       "      <td>paragraph-5</td>\n",
       "      <td>0.657851</td>\n",
       "      <td>0.491184</td>\n",
       "      <td>0.166667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Glorious animals live in the wilderness.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-1</td>\n",
       "      <td>0.383819</td>\n",
       "      <td>0.050486</td>\n",
       "      <td>0.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>The flora and fauna of a specific habitat highly depend on the weather.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-3</td>\n",
       "      <td>0.333975</td>\n",
       "      <td>0.000642</td>\n",
       "      <td>0.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Growing computation power enables advancements in AI.</td>\n",
       "      <td>[IT]</td>\n",
       "      <td>paragraph-2</td>\n",
       "      <td>0.277853</td>\n",
       "      <td>0.111186</td>\n",
       "      <td>0.166667</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                       body  \\\n",
       "0  The AI boom contributes to global warming through heating caused by extensive GPU usage.   \n",
       "1               An astonishing number of users still use ancient Windows operating systems.   \n",
       "2                                                  Glorious animals live in the wilderness.   \n",
       "3                   The flora and fauna of a specific habitat highly depend on the weather.   \n",
       "4                                     Growing computation power enables advancements in AI.   \n",
       "\n",
       "            category           id  similarity_score  body_space  \\\n",
       "0  [IT, environment]  paragraph-4          0.663999    0.163999   \n",
       "1               [IT]  paragraph-5          0.657851    0.491184   \n",
       "2      [environment]  paragraph-1          0.383819    0.050486   \n",
       "3      [environment]  paragraph-3          0.333975    0.000642   \n",
       "4               [IT]  paragraph-2          0.277853    0.111186   \n",
       "\n",
       "   category_space  \n",
       "0        0.500000  \n",
       "1        0.166667  \n",
       "2        0.333333  \n",
       "3        0.333333  \n",
       "4        0.166667  "
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# first, run a query that up-weights the with_vector contribution to the body_space part of the query vector\n",
    "with_vector_favored_result = app.query(\n",
    "    advanced_query,\n",
    "    body_space_weight=1,\n",
    "    category_space_weight=1,\n",
    "    body_input=\"computation power\",\n",
    "    category_input=\"environment\",\n",
    "    body_similar_weight=1,\n",
    "    category_similar_weight=1,\n",
    "    paragraph_id=\"paragraph-5\",\n",
    "    body_with_vector_weight=5,\n",
    "    category_with_vector_weight=1,\n",
    ")\n",
    "\n",
    "get_partial_score_df(with_vector_favored_result)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0b57e4c-10e1-4ad5-a7bb-ebc7e9caf984",
   "metadata": {},
   "source": [
    "Notice how the context vector (paragraph-5, about Windows operating systems) pushes paragraph-5 itself up to second place, because the `body_with_vector_weight` Param is significantly higher than `body_similar_weight`. The top result, paragraph-4, is there mainly due to the categorical match.\n",
    "\n",
    "In the next scenario, the relationship between these two Params is reversed, and paragraph-2 (computation power) ranks high because it is semantically closer to `body_input`. Also notice that only the ratio of these Params matters: `5` vs `1` is the **same** as `1` vs `0.2`."
   ]
  },
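  {
   "cell_type": "markdown",
   "id": "sketch-weight-ratio",
   "metadata": {},
   "source": [
    "Why only the ratio of clause weights matters can be seen in a minimal sketch. It assumes that the clause inputs of one space are combined as a weighted sum and then normalized; the function name and the two vectors below are hypothetical, and Superlinked's internals may differ in detail.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def mix_and_normalize(similar_vector, with_vector, similar_weight, with_vector_weight):\n",
    "    # weighted sum of the two clause inputs, scaled to unit length\n",
    "    mixed = similar_weight * similar_vector + with_vector_weight * with_vector\n",
    "    return mixed / np.linalg.norm(mixed)\n",
    "\n",
    "\n",
    "similar_vector = np.array([0.8, 0.6, 0.0])  # hypothetical embedding of the .similar input\n",
    "with_vector = np.array([0.0, 0.6, 0.8])  # hypothetical stored vector of the context item\n",
    "\n",
    "five_to_one = mix_and_normalize(similar_vector, with_vector, 5, 1)\n",
    "one_to_fifth = mix_and_normalize(similar_vector, with_vector, 1, 0.2)\n",
    "one_to_five = mix_and_normalize(similar_vector, with_vector, 1, 5)\n",
    "\n",
    "# same ratio, same direction; reversed ratio, different direction\n",
    "print(np.allclose(five_to_one, one_to_fifth))\n",
    "print(np.allclose(five_to_one, one_to_five))\n",
    "```\n",
    "\n",
    "Multiplying both weights by the same factor only rescales the sum, and normalization removes that scale again, so the resulting direction is unchanged."
   ]
  },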
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "c3681102-a904-43e9-add5-708b035e02fa",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>body</th>\n",
       "      <th>category</th>\n",
       "      <th>id</th>\n",
       "      <th>similarity_score</th>\n",
       "      <th>body_space</th>\n",
       "      <th>category_space</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>The AI boom contributes to global warming through heating caused by extensive GPU usage.</td>\n",
       "      <td>[IT, environment]</td>\n",
       "      <td>paragraph-4</td>\n",
       "      <td>0.681913</td>\n",
       "      <td>0.181913</td>\n",
       "      <td>0.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Growing computation power enables advancements in AI.</td>\n",
       "      <td>[IT]</td>\n",
       "      <td>paragraph-2</td>\n",
       "      <td>0.410214</td>\n",
       "      <td>0.243547</td>\n",
       "      <td>0.166667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Glorious animals live in the wilderness.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-1</td>\n",
       "      <td>0.351866</td>\n",
       "      <td>0.018533</td>\n",
       "      <td>0.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>An astonishing number of users still use ancient Windows operating systems.</td>\n",
       "      <td>[IT]</td>\n",
       "      <td>paragraph-5</td>\n",
       "      <td>0.344295</td>\n",
       "      <td>0.177629</td>\n",
       "      <td>0.166667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>The flora and fauna of a specific habitat highly depend on the weather.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-3</td>\n",
       "      <td>0.316096</td>\n",
       "      <td>-0.017237</td>\n",
       "      <td>0.333333</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                       body  \\\n",
       "0  The AI boom contributes to global warming through heating caused by extensive GPU usage.   \n",
       "1                                     Growing computation power enables advancements in AI.   \n",
       "2                                                  Glorious animals live in the wilderness.   \n",
       "3               An astonishing number of users still use ancient Windows operating systems.   \n",
       "4                   The flora and fauna of a specific habitat highly depend on the weather.   \n",
       "\n",
       "            category           id  similarity_score  body_space  \\\n",
       "0  [IT, environment]  paragraph-4          0.681913    0.181913   \n",
       "1               [IT]  paragraph-2          0.410214    0.243547   \n",
       "2      [environment]  paragraph-1          0.351866    0.018533   \n",
       "3               [IT]  paragraph-5          0.344295    0.177629   \n",
       "4      [environment]  paragraph-3          0.316096   -0.017237   \n",
       "\n",
       "   category_space  \n",
       "0        0.500000  \n",
       "1        0.166667  \n",
       "2        0.333333  \n",
       "3        0.166667  \n",
       "4        0.333333  "
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "similar_favored_result = app.query(\n",
    "    advanced_query,\n",
    "    body_space_weight=1,\n",
    "    category_space_weight=1,\n",
    "    body_input=\"computation power\",\n",
    "    category_input=\"environment\",\n",
    "    body_similar_weight=1,\n",
    "    category_similar_weight=1,\n",
    "    paragraph_id=\"paragraph-5\",\n",
    "    body_with_vector_weight=0.2,\n",
    "    category_with_vector_weight=1,\n",
    ")\n",
    "\n",
    "get_partial_score_df(similar_favored_result)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e4b54a2f-339a-43bb-af8d-7e919e0a78b2",
   "metadata": {},
   "source": [
    "Regardless of the individual clause weights, however, the final similarity is driven predominantly by the space weights. In the next query, increasing `category_space_weight` to `5` reshuffles the ranking so that the paragraphs in the environment category rank highest. The partial scores show the same shift: the category space score accounts for most of the overall score."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "adac5307-db45-4707-922e-45294952e206",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>body</th>\n",
       "      <th>category</th>\n",
       "      <th>id</th>\n",
       "      <th>similarity_score</th>\n",
       "      <th>body_space</th>\n",
       "      <th>category_space</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>The AI boom contributes to global warming through heating caused by extensive GPU usage.</td>\n",
       "      <td>[IT, environment]</td>\n",
       "      <td>paragraph-4</td>\n",
       "      <td>0.748332</td>\n",
       "      <td>0.054957</td>\n",
       "      <td>0.693375</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Glorious animals live in the wilderness.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-1</td>\n",
       "      <td>0.473216</td>\n",
       "      <td>0.010965</td>\n",
       "      <td>0.462250</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>The flora and fauna of a specific habitat highly depend on the weather.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-3</td>\n",
       "      <td>0.459614</td>\n",
       "      <td>-0.002637</td>\n",
       "      <td>0.462250</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>An astonishing number of users still use ancient Windows operating systems.</td>\n",
       "      <td>[IT]</td>\n",
       "      <td>paragraph-5</td>\n",
       "      <td>0.337383</td>\n",
       "      <td>0.106258</td>\n",
       "      <td>0.231125</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Growing computation power enables advancements in AI.</td>\n",
       "      <td>[IT]</td>\n",
       "      <td>paragraph-2</td>\n",
       "      <td>0.287483</td>\n",
       "      <td>0.056358</td>\n",
       "      <td>0.231125</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                       body  \\\n",
       "0  The AI boom contributes to global warming through heating caused by extensive GPU usage.   \n",
       "1                                                  Glorious animals live in the wilderness.   \n",
       "2                   The flora and fauna of a specific habitat highly depend on the weather.   \n",
       "3               An astonishing number of users still use ancient Windows operating systems.   \n",
       "4                                     Growing computation power enables advancements in AI.   \n",
       "\n",
       "            category           id  similarity_score  body_space  \\\n",
       "0  [IT, environment]  paragraph-4          0.748332    0.054957   \n",
       "1      [environment]  paragraph-1          0.473216    0.010965   \n",
       "2      [environment]  paragraph-3          0.459614   -0.002637   \n",
       "3               [IT]  paragraph-5          0.337383    0.106258   \n",
       "4               [IT]  paragraph-2          0.287483    0.056358   \n",
       "\n",
       "   category_space  \n",
       "0        0.693375  \n",
       "1        0.462250  \n",
       "2        0.462250  \n",
       "3        0.231125  \n",
       "4        0.231125  "
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "space_weight_overpowers = app.query(\n",
    "    advanced_query,\n",
    "    body_space_weight=1,\n",
    "    category_space_weight=5,\n",
    "    body_input=\"computation power\",\n",
    "    category_input=\"environment\",\n",
    "    body_similar_weight=5,\n",
    "    category_similar_weight=1,\n",
    "    paragraph_id=\"paragraph-5\",\n",
    "    body_with_vector_weight=5,\n",
    "    category_with_vector_weight=1,\n",
    ")\n",
    "\n",
    "get_partial_score_df(space_weight_overpowers)"
   ]
  },
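  {
   "cell_type": "markdown",
   "id": "sketch-partial-score-sum",
   "metadata": {},
   "source": [
    "In the table above, the per-space partial scores add up to `similarity_score`, up to rounding in the displayed digits. The sketch below re-checks this on the displayed values, copied by hand into a standalone DataFrame; the column names `partial_sum` and `category_share` are introduced here for illustration only.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# values copied from the result table above\n",
    "scores = pd.DataFrame(\n",
    "    {\n",
    "        'id': ['paragraph-4', 'paragraph-1', 'paragraph-3', 'paragraph-5', 'paragraph-2'],\n",
    "        'similarity_score': [0.748332, 0.473216, 0.459614, 0.337383, 0.287483],\n",
    "        'body_space': [0.054957, 0.010965, -0.002637, 0.106258, 0.056358],\n",
    "        'category_space': [0.693375, 0.462250, 0.462250, 0.231125, 0.231125],\n",
    "    }\n",
    ")\n",
    "\n",
    "# the partial scores of the two spaces should add up to the overall score\n",
    "scores['partial_sum'] = scores['body_space'] + scores['category_space']\n",
    "# fraction of the overall score contributed by the category space\n",
    "scores['category_share'] = scores['category_space'] / scores['similarity_score']\n",
    "\n",
    "# allow for rounding in the displayed values\n",
    "sums_match = bool(((scores['partial_sum'] - scores['similarity_score']).abs() < 1e-5).all())\n",
    "print(scores[['id', 'partial_sum', 'category_share']])\n",
    "print(sums_match)\n",
    "```\n",
    "\n",
    "The `category_share` column makes the effect of the larger `category_space_weight` visible per row."
   ]
  },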
  {
   "cell_type": "markdown",
   "id": "7d6e49ea-4e6e-46e8-8605-e13f93d14772",
   "metadata": {},
   "source": [
    "## Filter results based on score or position"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "2332b7d5-65f2-4c54-978e-966f7e7e1a13",
   "metadata": {},
   "outputs": [],
   "source": [
    "# reuse the combined query from above with some preset params\n",
    "params = {\n",
    "    \"similar_body\": \"progress in AI\",\n",
    "    \"paragraph_id\": \"paragraph-3\",\n",
    "    \"similar_body_weight\": 1,\n",
    "    \"paragraph_weight\": 0.25,\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "ed00fb14-cf99-4433-ab38-ea91af416966",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>body</th>\n",
       "      <th>category</th>\n",
       "      <th>id</th>\n",
       "      <th>similarity_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>The AI boom contributes to global warming through heating caused by extensive GPU usage.</td>\n",
       "      <td>[IT, environment]</td>\n",
       "      <td>paragraph-4</td>\n",
       "      <td>0.693285</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>The flora and fauna of a specific habitat highly depend on the weather.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-3</td>\n",
       "      <td>0.564008</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                       body  \\\n",
       "0  The AI boom contributes to global warming through heating caused by extensive GPU usage.   \n",
       "1                   The flora and fauna of a specific habitat highly depend on the weather.   \n",
       "\n",
       "            category           id  similarity_score  \n",
       "0  [IT, environment]  paragraph-4          0.693285  \n",
       "1      [environment]  paragraph-3          0.564008  "
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# return top 2 items\n",
    "combined_query_limit_result = app.query(combined_query.limit(2), **params)\n",
    "sl.PandasConverter.to_pandas(combined_query_limit_result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "a14f155c-09c4-4cc9-b581-bf337011751d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>body</th>\n",
       "      <th>category</th>\n",
       "      <th>id</th>\n",
       "      <th>similarity_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>The AI boom contributes to global warming through heating caused by extensive GPU usage.</td>\n",
       "      <td>[IT, environment]</td>\n",
       "      <td>paragraph-4</td>\n",
       "      <td>0.693285</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>The flora and fauna of a specific habitat highly depend on the weather.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-3</td>\n",
       "      <td>0.564008</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Glorious animals live in the wilderness.</td>\n",
       "      <td>[environment]</td>\n",
       "      <td>paragraph-1</td>\n",
       "      <td>0.542344</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                       body  \\\n",
       "0  The AI boom contributes to global warming through heating caused by extensive GPU usage.   \n",
       "1                   The flora and fauna of a specific habitat highly depend on the weather.   \n",
       "2                                                  Glorious animals live in the wilderness.   \n",
       "\n",
       "            category           id  similarity_score  \n",
       "0  [IT, environment]  paragraph-4          0.693285  \n",
       "1      [environment]  paragraph-3          0.564008  \n",
       "2      [environment]  paragraph-1          0.542344  "
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# return items with scores larger than 0.5\n",
    "combined_query_radius_result = app.query(combined_query.radius(0.5), **params)\n",
    "sl.PandasConverter.to_pandas(combined_query_radius_result)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "superlinked-py3.11",
   "language": "python",
   "name": "superlinked-py3.11"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
